feat: Support Spark expression window_time#3732

Open
0lai0 wants to merge 6 commits into apache:main from 0lai0:expression_window_time

Conversation

@0lai0 0lai0 commented Mar 19, 2026

Which issue does this PR close?

Closes #3138

Rationale for this change

Comet previously did not support the Spark PreciseTimestampConversion expression.
This expression is not called directly by users, but is generated internally by Spark's Analyzer when it rewrites window_time(window_column) into a combination of GetStructField, Subtract, and PreciseTimestampConversion. Since Comet already supports GetStructField and Subtract but not PreciseTimestampConversion, queries using window_time would fall back to JVM execution.
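
For illustration (this query is not from the PR), `window_time` appears in SQL like the example below; the Analyzer's rewrite is sketched in the leading comment, based on the description above:

```sql
-- window_time(window) returns the window's event time (window.end minus
-- 1 microsecond). Spark's Analyzer rewrites it roughly as:
--   PreciseTimestampConversion(
--     Subtract(PreciseTimestampConversion(window.end, TimestampType, LongType), 1),
--     LongType, TimestampType)
SELECT window, window_time(window) AS event_time, count(*) AS cnt
FROM VALUES (timestamp'2023-01-01 12:00:00', 1) AS t(time, value)
GROUP BY window(time, '5 minutes');
```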

What changes are included in this PR?

This change adds a Serde handler for PreciseTimestampConversion that reuses the existing Cast protobuf and Rust implementation. Since TimestampType (microseconds) and LongType share the same 64-bit memory layout in Arrow, a simple cast suffices without any new native code paths.

  • datetime.scala : Added CometPreciseTimestampConversion handler
  • QueryPlanSerde.scala : Registered the handler in temporalExpressions map
  • window_time.sql : Added window_time test
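
A minimal sketch of the handler described above, consistent with the diff snippet quoted later in this conversation ("return child directly"). The surrounding names (`exprToProtoInternal`, `ExprOuterClass.Expr`, `Attribute`) are assumed from Comet's serde framework, so treat this as illustrative rather than the exact implementation:

```scala
// Hypothetical sketch of the CometPreciseTimestampConversion serde handler.
// PreciseTimestampConversion(child, fromType, toType) only reinterprets the
// value: TimestampType (microseconds) and LongType are both i64 in Arrow,
// so serializing the child expression unchanged is sufficient.
object CometPreciseTimestampConversion {
  def convert(
      expr: PreciseTimestampConversion,
      inputs: Seq[Attribute],
      binding: Boolean): Option[ExprOuterClass.Expr] = {
    // Both types share the same 64-bit layout — no cast node is emitted.
    exprToProtoInternal(expr.child, inputs, binding)
  }
}
```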

How are these changes tested?

Added a Scala unit test for window_time expression support in CometExpressionSuite

./mvnw test -Dsuites="org.apache.comet.CometExpressionSuite" -Dscala.useZincServer=false

@0lai0 0lai0 changed the title from "feat: Support Spark expression window time" to "feat: Support Spark expression window_time" on Mar 19, 2026

@comphead comphead left a comment

Thanks @0lai0
Triggering CI. Please put the SQL tests in separate .sql files, similar to array_append.sql.

"(cast('2023-01-01 12:15:00' as timestamp), 3)")

// basic window_time with aggregation
checkSparkAnswer(

checkSparkAnswer doesn't verify that the query is actually running in Comet. In any case, it would be better to add SQL-based tests instead, as @comphead pointed out.


0lai0 commented Mar 20, 2026

Thanks @comphead and @andygrove for the review and for pointing this out. I'll add SQL tests in separate .sql files and remove the previous test. By the way, I discovered that using setcast is problematic, so I have updated it accordingly.

INSERT INTO test_window_time VALUES (timestamp('2023-01-01 12:00:00'), 1), (timestamp('2023-01-01 12:05:00'), 2), (timestamp('2023-01-01 12:15:00'), 3)

-- basic window_time with tumbling window
query spark_answer_only

Both queries use spark_answer_only. Is there a reason these can't run with the default mode that verifies native execution? If there's another operator in the plan that causes fallback, could you note which one?


Thanks @andygrove for review.
While this PR adds native support for the expression itself, Spark's window() introduces unsupported operators that trigger a full fallback. Therefore, we temporarily use spark_answer_only for result verification. I'll add an inline comment to document this.
If I have misunderstood, please correct me.


Yes, windowed aggregates are marked as incompatible. Maybe you can enable them in the test with spark.comet.operator.WindowExec.allowIncompatible=true? You may need to check the exact config key, but it should be something like this.


Thanks @andygrove . I dug into this and tried enabling the config spark.comet.operator.WindowExec.allowIncompatible=true, but the test still fails if I remove spark_answer_only.

The root cause might be at the expression level: Spark's TimeWindowing analyzer rule automatically expands window() into a CreateNamedStruct wrapped inside a KnownNullable expression. Since Comet doesn't currently support KnownNullable natively, the planner is forced to fall back to Spark anyway.
https://github.com/search?q=repo%3Aapache%2Fspark+KnownNullable&type=code

Fully supporting native struct creation and KnownNullable would be a broader effort outside the scope of this PR (which focuses on the PreciseTimestampConversion issue for window_time).

Because of this, it might be better to keep spark_answer_only here and open a follow-up issue for those specific expressions. I'll make sure to update the inline comment to explicitly document the exact cause of the fallback for future reference.
Let me know your thoughts on this.

CREATE TABLE test_window_time(time timestamp, value int) USING parquet

statement
INSERT INTO test_window_time VALUES (timestamp('2023-01-01 12:00:00'), 1), (timestamp('2023-01-01 12:05:00'), 2), (timestamp('2023-01-01 12:15:00'), 3)

Could you add a row with a null timestamp to the test data? Something like (NULL, 4)?


Sure, I'll update this. Thanks.

Comment on lines +604 to +605
// Both types are i64 micros in Arrow, so no conversion needed — return child directly.
exprToProtoInternal(expr.child, inputs, binding)

👍

@andygrove

Thanks @0lai0 this is looking good so far. Could you also update the user docs to add this to the list of supported expressions



Development

Successfully merging this pull request may close these issues.

[Feature] Support Spark expression: window_time
